What kind of corpus did I choose and why?

The corpus that I use for my portfolio consists of 100 opera songs and 100 musical songs, collected from pre-existing playlists that are available on Spotify. I use Spotify’s Opera 100: Spotify Picks playlist as the basis for my opera playlist. This playlist consists of 97 tracks, so I added 3 tracks manually, based on Spotify’s suggestions. For my musical playlist, I use a public playlist called BROADWAY MUSICALS, made by Hugo Torres. Of all the musical playlists I could find, I found this one to be the most inclusive. Furthermore, I chose to focus on Broadway musicals because they were written to be performed in front of a live audience, just like operas. The original playlist consists of 150 songs, so I manually removed 50 tracks. I removed tracks from the musicals that had the most tracks in the playlist, in order to end up with as many different musicals as possible.
I chose this corpus because I have always been fascinated by musicals. More recently, I was introduced to operas and I recognized the same compelling drama I appreciate in musicals. Opera songs and musical songs both mainly serve to tell a story, but have very different styles. I wonder whether opera and musical songs share certain musical features, because both have such a strong narrative function.
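I did the trimming by hand, but the rule behind it can be written down. Below is a minimal sketch of that rule, assuming each track is stored as a (title, show) pair. The function name, the data layout, and the example titles are all hypothetical, and the choice to drop the last-listed track of the largest show is an assumption, not what I actually did.

```python
from collections import Counter


def trim_playlist(tracks, target_size):
    """Drop tracks from the most-represented show until target_size remain.

    tracks is a list of (title, show) tuples (a hypothetical layout).
    """
    kept = list(tracks)
    while len(kept) > target_size:
        # Count how many tracks each show currently has.
        counts = Counter(show for _, show in kept)
        biggest_show, _ = counts.most_common(1)[0]
        # Assumption: remove the last-listed track of the largest show.
        for i in range(len(kept) - 1, -1, -1):
            if kept[i][1] == biggest_show:
                del kept[i]
                break
    return kept


# Made-up example data, not tracks from the real playlist.
example = [
    ("Song A", "Show 1"), ("Song B", "Show 1"), ("Song C", "Show 1"),
    ("Song D", "Show 2"), ("Song E", "Show 2"), ("Song F", "Show 3"),
]
trimmed = trim_playlist(example, 4)
```

Because tracks are always taken from the show that currently has the most tracks, small shows keep their tracks for as long as possible, which is what keeps the number of different musicals high.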

Natural comparison points
In comparing opera tracks with musical tracks, I expect to find a difference in tempo and danceability. Furthermore, I wonder if opera songs are sadder than musical songs, which might be reflected in the valence. I am curious to find out if the energy and loudness differ between the groups. I expect the liveness, instrumentalness, and speechiness to be similar, because most songs in the corpus are studio recordings and contain vocals.

Weaknesses of the corpus
Adding music from every opera and musical would create a very large corpus, especially because most operas and musicals have had many productions with different artists, conductors, and musicians. As a result, many operas and musicals are missing entirely, so my corpus does not cover the whole of either genre. Furthermore, Spotify’s pre-existing playlists generally include only the well-known (classical) operas and musicals, leaving out smaller productions.

Typical tracks
Habanera - Carmen: for me, this is a typical opera song with very high notes that everyone knows.
La donna è mobile - Rigoletto: again, this is a very famous song. I think the grandeur of this song is typical for opera music.
One Day More - Les Misérables: this song is very dramatic and has multiple singers, which is typical for musical songs.
You Can’t Stop The Beat - Hairspray: the happiness and danceability of this song are typical for musical songs.

Atypical tracks
Summertime - Porgy and Bess: this is a jazzy song, which is a different genre than most opera songs.
Ride of the Valkyries - Die Walküre: this track has no lyrics, which is atypical for an opera song.
Totally Fucked - Spring Awakening: this comes close to a rock song, which is a different genre than most musical songs.
Land of Lola - Kinky Boots: this song has a strong disco vibe, with more use of electronic instruments than the average musical song.

Are musical songs really happier than opera songs?


I started by visualizing the distribution of valence against energy for both musical and opera tracks. Furthermore, this plot shows the danceability and tempo of the songs in my corpus.

As I expected, musical tracks cover a wider range of valence and energy than opera tracks. I think this might be because musicals can be written in a variety of styles and genres, whereas operas often share the same style. I was surprised that the plot shows so little difference between the opera tracks; I would have expected at least a little more variety. The plot suggests that almost all opera songs in my playlist are very sad, at least according to Spotify’s valence measure.

Furthermore, as expected, musical songs are generally more danceable than opera songs. It is interesting that the musical tracks seem to follow a roughly linear relationship: the more energy a track has, the higher its valence. This would mean that there are not many relaxed musical songs (low energy, high valence) or angry ones (high energy, low valence).
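One way to check whether this relationship really is close to linear is to compute the Pearson correlation between energy and valence. The sketch below is only an illustration: the function name is hypothetical and the numbers are made up, not values from my corpus.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up energy and valence values for a handful of tracks.
energy = [0.20, 0.35, 0.50, 0.65, 0.80]
valence = [0.15, 0.40, 0.45, 0.70, 0.75]
r = pearson_r(energy, valence)
```

A value of r close to 1 would support the linear pattern seen in the plot, while a value close to 0 would mean that energy says little about valence.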

Take a look at the clear outliers in both groups. Memory (from the musical Cats) is a particularly sad musical song, while Stizzoso, mio stizzoso, voi fate il borioso (from the opera La Serva Padrona) is a particularly happy opera song (and also turns out to be the most danceable opera song).

Because the opera songs are clustered so closely together, it is hard to tell the individual data points apart. That’s why I decided to take a closer look.

A closer look at the opera playlist


Take a look at a zoomed-in version of the opera plot you saw in the previous tab. In this plot, another outlier stands out: Et maintenant je dois offrir (from the opera Les Huguenots) has the highest energy of all opera songs in the corpus. I manually chose different x and y limits for this plot, so the data points are somewhat clearer now. However, they are still clustered together in the low-energy, low-valence corner. This suggests that Spotify rates almost all opera songs in my corpus as sad.
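The claim about the low-energy, low-valence corner can also be counted instead of read off the plot. Below is a minimal sketch that assigns each track to one of four mood quadrants. The cut-off of 0.5, the function name, and the example values are assumptions for illustration; the labels follow the common reading of the valence-energy plane that I use in the text (sad, relaxed, angry, happy).

```python
from collections import Counter


def mood_quadrant(valence, energy, threshold=0.5):
    """Label a track by its quadrant in the valence-energy plane.

    The threshold of 0.5 is an assumed cut-off, not a Spotify rule.
    """
    if valence >= threshold:
        return "happy" if energy >= threshold else "relaxed"
    return "angry" if energy >= threshold else "sad"


# Made-up (valence, energy) pairs, not values from my corpus.
tracks = [(0.10, 0.15), (0.08, 0.30), (0.20, 0.05), (0.75, 0.35)]
counts = Counter(mood_quadrant(v, e) for v, e in tracks)
```

Dividing the count for "sad" by the number of tracks gives the share of the playlist that falls in that corner.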

When you look at Spotify’s explanation of energy, you find this:

“Energy is a measure from 0.0 to 1.0 and represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy. For example, death metal has high energy, while a Bach prelude scores low on the scale.”

Given this definition, I think it is not surprising that opera songs are low in energy. However, I expected to find more songs with higher energy because opera songs can be very temperamental, which I thought would be reflected in the energy values. It could also very well be the case that these temperamental songs are just not represented well in my corpus.

As for the low valence values for opera songs: I think this can be explained by the fact that many operas are tragedies, so using sad songs in these stories makes total sense.

What’s so special about “Stizzoso, mio stizzoso, voi fate il borioso”?


Here, you see a chromagram of the song “Stizzoso, mio stizzoso, voi fate il borioso” (from the opera La Serva Padrona). The norm for the chroma vectors used here is Euclidean. I chose this norm because, of the three possible norms, it showed the patterns in the chromagram most clearly.
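To make the choice of norm more concrete, here is a minimal sketch of what normalising a single chroma vector does. I am assuming that the three norms are the Manhattan, Euclidean, and Chebyshev (maximum) norms, which the text above does not name, and the function name and the example vector are made up.

```python
from math import sqrt


def normalise(chroma, norm="euclidean"):
    """Scale a 12-element chroma vector so that its chosen norm equals 1."""
    if norm == "manhattan":
        size = sum(abs(v) for v in chroma)        # sum of absolute values
    elif norm == "euclidean":
        size = sqrt(sum(v * v for v in chroma))   # length of the vector
    elif norm == "chebyshev":
        size = max(abs(v) for v in chroma)        # largest single value
    else:
        raise ValueError(f"unknown norm: {norm}")
    if size == 0:
        return list(chroma)  # a silent segment stays all zeros
    return [v / size for v in chroma]


# Made-up chroma vector (C, C#, D, ..., B) with most weight on E and A.
chroma = [0.1, 0.0, 0.2, 0.0, 0.9, 0.1, 0.0, 0.1, 0.0, 0.8, 0.0, 0.1]
scaled = normalise(chroma, "euclidean")
```

The norm only changes how each time segment is scaled, not which pitch classes are strongest within it, so switching norms mainly affects the contrast between segments in the plot.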

What you can tell from this chromagram is that the song mainly jumps from E to A. You can hear these jumps quite well in the song (you can listen to it here). Other than that, it is quite hard to tell what the chromagram represents, because the pattern is quite scattered.

In order to make the analysis of this song more meaningful, I compared its chromagram to the chromagram of a very average song in my corpus: “Glück, das mir verblieb” (from the opera Die Tote Stadt). You can find this chromagram in the next tab (and listen to it here).

What you can see in this chromagram is that the two most used pitches in the song (F and A#) stand out much more clearly. Maybe this means that “Stizzoso, mio stizzoso, voi fate il borioso” simply uses a wider variety of pitches.

When you listen to the two songs, it makes sense that Spotify rates “Stizzoso, mio stizzoso, voi fate il borioso” as much happier than “Glück, das mir verblieb”. The former has lots of happy violins in it, while the latter is very melodramatic.

What does an average opera track look like? Chromagram of the song “Glück, das mir verblieb”


Here, you see a chromagram of the song “Glück, das mir verblieb” (from the opera Die Tote Stadt). The norm for the chroma vectors used here is Euclidean.